perf: reuse searcher across batch queries in find_evidence.py#145
perf: reuse searcher across batch queries in find_evidence.py#145tarilabs wants to merge 2 commits intoredhat-documentation:mainfrom
Conversation
In batch mode, ensure_index() was called per query via retrieve_evidence(), redundantly reloading the SentenceTransformer model, reopening the Milvus DB, and rebuilding the entire BM25 index on every iteration (~2-6s overhead each). Call ensure_index() once before the loop and reuse the returned searcher for all queries within each batch invocation. This saves ~18-84s in the scope-req-audit step (~10-15 queries) and ~18-174s in the code-evidence step (~10-30 queries with two-pass retrieval). Single-query mode is unchanged — it still delegates to retrieve_evidence(). Co-Authored-By: Claude <noreply@anthropic.com> Signed-off-by: tarilabs <matteo.mortari@gmail.com>
WalkthroughBatch evidence retrieval was refactored to build and reuse a shared search index via ChangesBatch Retrieval Refactor
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes 🚥 Pre-merge checks | ✅ 10✅ Passed checks (10 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Review rate limit: 0/1 reviews remaining, refill in 60 minutes.Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@plugins/docs-tools/skills/code-evidence/scripts/find_evidence.py`:
- Around line 133-134: The import ordering in find_evidence.py is unsorted and
causing lint failure: move the import of
claude_context.skills._index_manager.ensure_index so it appears before
claude_context.skills.evidence_retrieval.retrieve_evidence (i.e., ensure_index
import should precede retrieve_evidence), then re-run the project's import
sorter/formatter (or manually reorder the two import lines) to satisfy I001.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: ad13bc82-68d5-4b48-823a-975e819d430b
📒 Files selected for processing (1)
plugins/docs-tools/skills/code-evidence/scripts/find_evidence.py
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@plugins/docs-tools/skills/code-evidence/scripts/find_evidence.py`:
- Line 44: The function _format_result declares an unused parameter
filter_paths; remove filter_paths from the function signature and update any
call sites that pass filter_paths (the invocation that builds the outer wrapper
dict) to stop supplying it—ensure the wrapper still adds filter_paths separately
as before; only change _format_result's signature and caller arguments, leaving
the body and other returned fields untouched.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 054acbb3-523e-41a8-bd8a-45faf40acd8f
📒 Files selected for processing (1)
plugins/docs-tools/skills/code-evidence/scripts/find_evidence.py
| return [str((repo_root / p).resolve()) for p in filter_paths] | ||
|
|
||
|
|
||
| def _format_result(query, filter_paths, repo_path, index_info, results): |
There was a problem hiding this comment.
Remove unused filter_paths parameter.
The filter_paths parameter is declared but never referenced in the function body. At line 172, filter_paths is added to the outer wrapper dict separately, not via this function.
Proposed fix
-def _format_result(query, filter_paths, repo_path, index_info, results):
+def _format_result(query, repo_path, index_info, results):And update the call site at line 171:
- result = _format_result(query, filter_paths, repo_path, index_info, raw)
+ result = _format_result(query, repo_path, index_info, raw)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@plugins/docs-tools/skills/code-evidence/scripts/find_evidence.py` at line 44,
The function _format_result declares an unused parameter filter_paths; remove
filter_paths from the function signature and update any call sites that pass
filter_paths (the invocation that builds the outer wrapper dict) to stop
supplying it—ensure the wrapper still adds filter_paths separately as before;
only change _format_result's signature and caller arguments, leaving the body
and other returned fields untouched.
In batch mode, ensure_index() was called per query via retrieve_evidence(), redundantly reloading the SentenceTransformer model, reopening the Milvus DB, and rebuilding the entire BM25 index on every iteration (~2-6s overhead each).
Call ensure_index() once before the loop and reuse the returned searcher for all queries within each batch invocation. This saves ~18-84s in the scope-req-audit step (~10-15 queries) and ~18-174s in the code-evidence step (~10-30 queries with two-pass retrieval).
Single-query mode is unchanged — it still delegates to retrieve_evidence().
Summary by CodeRabbit